Open-source project: mozilla/DeepSpeech
Baidu paper: Deep Speech: Scaling up end-to-end speech recognition
Installation:
pip install deepspeech
Usage:
deepspeech output_model.pb my_audio_file.wav alphabet.txt
Documentation: Welcome to DeepSpeech’s documentation!
Original project description:
Project DeepSpeech is an open source Speech-To-Text engine. It uses a model trained by machine learning techniques, based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow project to make the implementation easier.
Pre-built binaries for performing inference with a trained model can be installed with pip. Setting up inside a virtual environment is recommended; the procedure is documented in the project README.
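A virtual environment keeps the deepspeech package and its dependencies separate from the system Python. The following is a minimal sketch using only the standard library `venv` and `subprocess` modules; the helper names `venv_python` and `create_env_and_install` and the script name `make_env.py` are hypothetical and not part of DeepSpeech.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the path of the Python interpreter inside a virtual environment."""
    # Windows places executables in "Scripts"; other platforms use "bin".
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env_and_install(env_dir: Path, package: str = "deepspeech") -> Path:
    """Create a virtual environment at env_dir and pip-install `package` into it.

    Hypothetical helper for illustration; needs network access for pip.
    """
    # with_pip=True bootstraps pip into the new environment.
    venv.create(env_dir, with_pip=True)
    python = venv_python(env_dir)
    # Run pip through the environment's own interpreter so the package
    # lands inside the environment rather than in the system Python.
    subprocess.run([str(python), "-m", "pip", "install", package], check=True)
    return python


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python make_env.py ENV_DIR")
    else:
        print("installed into", create_env_and_install(Path(sys.argv[1])))
```

Once the environment is activated, the deepspeech command should be available on the PATH.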
Once installed you can then use the deepspeech binary to do speech-to-text on an audio file:
pip install deepspeech
deepspeech output_model.pb my_audio_file.wav alphabet.txt
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux; the realtime factor on a GeForce GTX 1070 is about 0.44. (See the release notes to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
pip install deepspeech-gpu
deepspeech output_model.pb my_audio_file.wav alphabet.txt
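To put the realtime factor quoted above in concrete terms: assuming the usual definition (processing time divided by audio duration), a value of 0.44 means a 10-second clip takes roughly 4.4 seconds to transcribe. The sketch below uses only the standard library; the function names are hypothetical and nothing here calls DeepSpeech itself.

```python
import wave


def audio_duration_seconds(wav_path: str) -> float:
    """Return the length of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Realtime factor, assumed here to be processing time / audio duration.

    Values below 1.0 mean transcription runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate how long transcription takes for a clip at a given realtime factor."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # 0.44 is the GTX 1070 figure quoted in the project description.
    print(estimated_processing_seconds(10.0, 0.44))
```

To measure the factor on your own hardware, time one deepspeech run and divide by `audio_duration_seconds("my_audio_file.wav")`.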
See the output of deepspeech -h for more information on the use of deepspeech. (If you experience problems running deepspeech, please check the required runtime dependencies.)
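To run the command above from a script, one option is to call the installed binary through `subprocess`. This is a minimal sketch, assuming the positional argument order shown above (model, audio file, alphabet) and assuming the transcript is written to standard output. The function names are hypothetical, and the command-line interface may differ between releases, so check deepspeech -h for the version you installed.

```python
import shutil
import subprocess
from typing import List


def build_deepspeech_command(model_path: str, audio_path: str, alphabet_path: str) -> List[str]:
    """Build the argument list for the CLI call shown in the project description."""
    # Assumed order: model, audio file, alphabet (as in the commands above).
    return ["deepspeech", model_path, audio_path, alphabet_path]


def transcribe(model_path: str, audio_path: str, alphabet_path: str) -> str:
    """Run the deepspeech binary and return whatever it prints to stdout."""
    if shutil.which("deepspeech") is None:
        raise FileNotFoundError(
            "deepspeech executable not found on PATH; run `pip install deepspeech` first"
        )
    result = subprocess.run(
        build_deepspeech_command(model_path, audio_path, alphabet_path),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: the transcript is printed to stdout.
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        print(transcribe("output_model.pb", "my_audio_file.wav", "alphabet.txt"))
    except (FileNotFoundError, subprocess.CalledProcessError) as error:
        print("transcription failed:", error)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.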